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ABSTRACT 

We consider an input x generated by an unknown sta- 
tionary ergodic source X that enters a signal processing 
system J, resulting in w = J{x). We observe w through 
a noisy channel, y = z(w); our goal is to estimate x 
from y, J, and knowledge of fY\w- This is universal 
estimation, because fx is unknown. We provide a for- 
mulation that describes a trade-off between information 
complexity and noise. Initial theoretical, algorithmic, 
and experimental evidence is presented in support of our 
approach. 

1. EVTRODUCTION 

Universal algorithms [1-5] achieve the best possible per- 
formance asymptotically - without knowing the input 
statistics. These algorithms have had tremendous im- 
pact in lossless compression, which is crucial for data 
backups and transmissions. In sharp contrast, universal 
algorithms have made much less impact in other areas. 

Estimation algorithms attempt to recover an input 
from noisy measurements (Figure 1). Numerous esti- 
mation problems have received great attention including 
the additive noise scalar channel, y ~ x + z [6]; Unear 
matrix multiplication with additive noise, y = Jx + z 
with applications including compressed sensing [7-9], 
finance, medical and seismic imaging; universal lossy 
compression [4,5, 10], where the goal is to find com- 
pressible X that is sufficiently close to y; nonlinear re- 
gression, where J{x) is nonlinear; and distributed signal 
processing. 

In these estimation problems, the common goal is 
to estimate the input x from knowledge of the noisy 
measurements y and measurement system J. To do so, 
we must exploit all statistical structure in x. A particu- 
larly challenging type of statistical structure is the ap- 
pearance of spatial or temporal dependencies in data. 
In images, such dependencies can be captured by dic- 
tionary learning or employing energy compacting trans- 
forms. In other problems, the statistical dependencies 



might be more subtle. Following the lead of universal 
lossless compression, we assume that the input x was 
generated by an unknown stationary ergodic source X. 
It is well known that stationary ergodic models capture 
the statistics of text files well, and hence the success of 
universal lossless compressors. Stationary ergodic mod- 
els have also been incorporated in speech denoising and 
enhancement, and appear prominently in hidden Markov 
models. 

One approach to universal estimation relies on Kol- 
mogorov complexity [11]. For a prospective x, the Kol- 
mogorov complexity K{x) is the length of the shortest 
computer program that can compute x. Donoho [12] 
proposed a Kolmogorov-based estimator for the white 
scalar channel, y = x + z. Despite related extensions to 
compressed sensing [8,9], what is missing in the liter- 
ature is a universal approach in arbitrary measurement 
systems that would support noise and unknown station- 
ary ergodic input distributions. 

We propose to perform universal estimation in (po- 
tentially nonlinear) signal processing systems from noisy 
measurements. The algorithmic component of our work 
features a harmonious marriage of scalar quantization, 
universal lossless compression, and Markov chain Monte 
Carlo. We evaluate the estimated input x over a quan- 
tized grid and optimize for the trade-off between infor- 
mation complexity (lossless coding length) of x and how 
well x explains the measurements y. We report promis- 
ing preliminary theoretical and numerical results. 

2. INFORMATION COMPLEXITY 
FORMULATION 

We focus on the setting where the lengths M of the 
output y and N of the input x both grow to infinity, 
Af, ^ oo. We further assume that their ratio is finite 
and positive, limTv^oo ^ = i5 > 0. Similar settings 
have been discussed in the literature, e.g., [13]. Since x 
was generated by an unknown source, we must search 
for an estimation mechanism that is agnostic to the spe- 
cific distribution fx- 
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Figure 1 . Measurement and estimation system; An input 
X € R'^ generated by an unknown stationary ergodic source 

X is processed by a known (potentially nonlinear) operator J 
to produce w = J{x) € ffi^. A probabilistic noise operator z 
that implies a known probability density fY\wiy\w = J{x)) 
is applied to w, the measurements are y — z{J{x)). Our goal 
is to estimate x using y G M^^ and J, resulting in a; e . Al- 
though our emphasis is on real-valued w,x,y, discrete-valued 
signals and operators are allowed. 



Kolmogorov complexity: For x G R^, the Kolmogo- 
rov complexity [11] of x, denoted by K{x), is the length 
of the shortest computer program that can compute x. To 
be more precise, K{x) is the length of the shortest input 
to a Turing machine [14] that generates x and then halts. 
We limit our discussion to Turing machines whose "in- 
put tapes" consist of bits. Consider the shortest program 
P{x) that generates x. From the perspective of a source 
encoder [6], we say that V{x) is a code for x. 

Having linked Turing machines [14] and data com- 
pression 1 6], let us temporarily limit the discussion to 
discrete valued x generated by a stationary ergodic source 
X. Each such x is generated with probabihty px{x), 
and it is easily shown that the per-symbol Kolmogorov 
coding length K{x) converges to the entropy rate H al- 
most surely, limAr_yoo jfK{x) = H [6]. Noting that 
universal lossless compressors [1,2] achieve H asymp- 
totically for discrete valued stationary ergodic sources [6], 
we see that these algorithms achieve the per-symbol Kol- 
mogorov complexity almost surely. 

Kolmogorov sampler: For additive white Gaussian 
noise, y = x+z, Donoho [12] proposed the Ab/mogorov 
sampler, 

xks = 3xgm:m.^{K{x) - \og{fz{z = y- x))}. 

For stationary ergodic X, x^s is sampled from the pos- 
terior fx\Y{y\x), where the mean square error, E[(xks— 
x)'^], is twice larger than the Bayesian minimum mean 
square error (MMSE) [12]. 

In a later paper, Donoho et al. discussed a Kol- 
mogorov estimator for compressed sensing y = Jx [8]; 
their estimator ignores noise, and is of limited practi- 
cal interest. For the noisy version of this problem, y = 



Jx + z, Haupt and Nowak [9] described a complexity 
measure that, when optimized, produces the LASSO al- 
gorithm [15]. To the best of our knowledge, Haupt and 
Nowak did not pursue complexity based regularization 
beyond iid signals and additive white Gaussian noise 
(AWGN). 

Quantization and estimation: The overwhelming 
majority of real numbers have infinite Kolmogorov com- 
plexity. Nonetheless, some scalars x G M.^ can be rep- 
resented by a finite length V{x). In practice, it is im- 
possible to compute K{x) even for discrete alphabets. 
At the same time, we have seen that universal lossless 
source codes [1,2] achieve per-symbol Kolmogorov cod- 
ing length almost surely [6]. To represent continuous 
valued x, we apply a scalar quantizer, Q -.x G — > 
x' G , and then compress x' = Q{x) with a uni- 
versal lossless compressor U with coding length U(x'), 
where quantization levels Q C M consist of a finite sub- 
set of M, and performing an optimization over x E 
reduces the complexity of the estimation problem from 
infinite to combinatorial. Note that we generate x' by 
independently quantizing each entry of x with Q. This 
encoder first describes the quantizer Q and then com- 
presses Q{x). The coding length, which we desire to 
minimize, is denoted by U{Q{x)) or U{x). 

It would seem that we must search for a good scalar 
quantizer Q (Section 3), but data-independent repmduc- 
tion levels are of theoretical interest, 
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As N increases, TZ will quantize a broader range of val- 
ues of a; to a greater resolution. An encoder based on TZ 
need not describe the structure of the data-independent 
quantizer, because is known. That is, U{TZ{x)) only 
accounts for the length of the universal code U. 

Universal MAP estimation: We perform maximum 
a posteriori (MAP) estimation over possible sequences 
x e TZ^, where the prior px{x) = 2"^^^^ utilizes the 
coding length U{x) of some universal lossless compres- 
sor [1, 2], 

xmap = arg min {U{x) -log{fY\w{y\w = J{x)))}, 

(1) 

where we note that TZ{x) — x for i e 7^". Our MAP 
estimator is applicable to any signal processing system 
J and supports any probabilistic noise operators, it is 
closely related to universal prediction [2, 3]. 

Estimation performance: We have promising pre- 
liminary theoretical results using the data-independent 
quantizer TZ. In universal lossy source coding of analog 
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(continuous valued) sources [4], we have shown with 
Weissman that xmap (1) achieves the rate distortion 
function for finite variance stationary ergodic sources in 
an appropriate asymptotic sense. That is, U{TZ{x)) of- 
fers a sufficiently good approximation to K{x) in uni- 
versal lossy compression, where we chose U{x) to be 
empirical entropy of blocks of q — 0{\og{N)) sym- 
bols in X. In universal compressed sensing [16], we 
have shown with Duarte that under minor technical con- 
ditions on fx, performing MAP estimation over the dis- 
crete alphabet TZ converges to the MAP estimate over 
the continuous distribution fx asymptotically, where we 
used i.i.d. zero-mean Gaussian noise z G with 
known variance. It remains to be seen whether TZ or 
other data-independent quantizers are useful for arbi- 
trary nonlinear measurement systems. 

In terms of the mean square error, we would ex- 
pect Xmap to perform well in Donoho's scalar chan- 
nel setting, y — x + z. With Duarte [16], we have 
promising results for the compressed sensing (linear ma- 
trix multiplication) channel, y — Jx + z, where we ap- 
proximated Xmap (1) by a Markov chain Monte Carlo 
(MCMC) [17] algorithm (Section 3). Figure 2 illustrates 
recovery results from Gaussian measurement matrices 
for a source with i.i.d. Bernoulli entries with nonzero 
probability of 3%. Our MCMC algorithm outperforms 
£i-norm minimization, which is a well-known compressed 
sensing reconstruction (estimation) algorithm [7], ex- 
cept when the number of measurements M is low. Com- 
paring MCMC to the minimum mean square error (MMSE) 
achievable in the Bayesian regime with known statis- 
tics [13], the square error achieved by MCMC is three 
times larger. One is left to wonder whether the mean 
square error performance of our algorithm might also be 
double the MMSE, particularly in the limit of infinite 
computation (Section 3). 

Taking Kolmogorov beyond MAP: The Kolmogorov 
sampler xks samples from the posterior [12]; it throws 
away all the statistical information it has on signals x 
that differ from xks- Seeing that the mean square error 
obtained by xks is double the MMSE, there is great po- 
tential to reduce estimation error over our Kolmogorov- 
based MAP estimator xmap (!)■ We therefore propose 
Kolmogorov-based conditional expectation, 

XMSE = E[x\J,y] 

E,en-S:-2-^(-^fy\^iy\w^Jix)) 
Exe7^«2-t^(*)/y|v^(y|z« = J(x)) ' 

where we employ the universal prior, px (x) = 2^'^^^*^^^^ . 
It is well known that conditional expectation achieves 
the MMSE of the Bayesian regime, and this estimator 
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Figure 2. Universal Markov chain Monte Carlo 
(MCMC) [16] and li-norm minimization [7] recovery results 
for a source with i.i.d. Bernoulli entries with nonzero proba- 
bility of 3% as a function of the number of Gaussian random 
measurements M for different signal to noise ratio (SNR) val- 
ues. 

should perform well. Interestingly, when the signal to 
noise ratio (SNR) is low, the Bayesian MMSE is siz- 
able, and achieving double the MMSE is unimpressive. 
In these low SNR settings, xmse should estimate much 
better than xmap- 

In some signal processing systems, one wants to min- 
imize some other (not necessarily quadratic) distortion 
metric D{x,x). The universal prior is readily invoked 
by defining the Kolmogorov conditional probability, 

, I . Py\xPx 

PX\Y{x\y) = (XpY\xPX, 

PY 

and taking the minimizing expression gives the Kolmogorov- 
based estimator for _D(-), 

XD = ai-gmml ^ D{x,w)fY\w{y\w = J(x))2-^'^^^ \ 

For scalar channels and iid noise, Sivaramakrishnan and 
Weissman [18] described a universal denoising algorithm 
that estimates x by xsw, its expected error E[D{x, xsw)] 
converges to the Bayesian risk asymptotically in an ap- 
propriate stochastic setting. For scalar channels and iid 
noise, our expected estimation error E[D{x, xd)] should 
also be asymptotically optimal. The performance in ar- 
bitrary signal processing systems J is an open question. 
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3. ALGORITHMS 

In principle, xmap can be computed by evaluating 
Kolmogorov -based posteriors of |7?.|^ possible sequ 
TZ{x). This is better than continuous estimation, but 
computationally intractable. Instead, we perform 
optimization using Markov chain Monte Carlo (MCI 
17], where U{x) = Hq{x) is the empirical entrof 
blocks of q = 0(log(A^)) symbols of x. 

Markov chain Monte Carlo: We use MCMC 
to approximate xmap^ which is the globally opt 
MAP minimizer To keep things simple, assume 
X e 7?.^ is a candidate estimate. Define the Boltzn 
PDF, 

fs{x) = ^ exp{-s[Hq{x)-log{fY\w{y\w = J{x 

where Hq{x) is the empirical entropy of blocks 
symbols in x [2, 4, 5, 16], q = 0{log{N)) to ensure 
vergence of the empirical entropy to the entropy rate 
s > is inversely related to temperature in an ai 
gous statistical physics heat-bath setting [17], and (, 
normalization constant. To sample from the Boltzn 
PDF (2), we use a Gibbs sampler, in each iteratic 
single element Xn is generated by resampling fron 
PDF, while the rest of x remains unchanged. The 
idea is to reduce temperatures slowly enough for the 
domness of Gibbs sampling to eventually drive MC 
out of any local minimum toward the globally opt 
Xmap- 

Adaptive quantizer: Jalali and Weissman [5] 
used MCMC to approach the fundamental rate disto: 
(RD) limits [6] in lossy compression of binary inj 
For continuous valued (analog) sources [4], using 
data-independent quantizer TZ in MCMC asymptotically 
achieves the RD function universally for stationary er- 
godic continuous amplitude sources. However, TZ grows 
with the input length, slowing down the convergence to 
the RD function, and is thus an impediment in practice. 

To address this issue, we next propose an MCMC- 
based algorithm that uses an adaptive quantizer Q. The 
ground-breaking work by Rose on the discrete nature of 
the Shannon codeboook for iid sources when the Shan- 
non lower bound is not tight [19] suggests that, for most 
sources of practical interest, restriction of the quantizer 
Q to a smaller number of levels does not stand in the way 
of attaining the fundamental compression limits. When 
employed on such sources, our latter algorithm zeroes in 
on the finite quantizer, and thus enjoys rates of conver- 
gence commensurate with the small-quantizer setting. 

Numerical results: In universal lossy compression 
of analog sources [4], we have developed an algorithm 
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Figure 3. Universal lossy compression; Rate R vs. distor- 
tion D of entropy coding [20], results by Yang and Zhang [10], 
average rate and distortion of our universal lossy compression 
algorithna [4], and the RD function [6] for length- 15000 iid 
Laplace inputs, fx{x) = |e~'^'. 

that optimizes the quantizer for square error, and have 
promising preliminary results. Figure 3 compares re- 
sults for an iid Laplace input, /x (2^) — 1^1, achieved 
by entropy coding [20], a deterministic approach by Yang 
and Zhang [10], and our universal MCMC algorithm [4]. 

In our universal compressed sensing work with Duarte 
[16], we focused on development of a fast routine for 
optimizing the quantizer; this routine greatly accelerates 
the algorithm. We have seen in Figure 2 for a source 
with i.i.d. Bernoulli entries with nonzero probability of 
3% that MCMC outperforms i!i-norm minimization, ex- 
cept when the number of measurements AI is low. We 
have additional results, but omit these for brevity; MCMC 
generally estimates the input signal x well, but much 
work remains to be done. 
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